Task-aware deep bottleneck features for spoken language identification
Abstract
Recently, deep bottleneck features (DBF), extracted from a deep neural network (DNN) containing a narrow bottleneck layer, have been applied to language identification (LID) and have yielded significant performance improvements over state-of-the-art methods on NIST LRE 2009. However, the DNN is trained on a large corpus of a specific language that is not directly related to the LID task. More recently, lattice-based discriminative training methods for extracting more targeted DBF were proposed for automatic speech recognition (ASR). Inspired by this, this paper proposes to tune the post-trained DNN parameters using an LID-specific training corpus, which may make the resulting DBF, termed discriminative DBF (D2BF), more discriminative and task-aware. Specifically, the maximum mutual information (MMI) criterion is applied with gradient descent to iteratively update the DNN parameters of the bottleneck layer. We evaluate the proposed D2BF with different back-end models, including GMM-MMI and i-vector, on the six most confused languages selected from NIST LRE 2009. The results show that the proposed D2BF is more appropriate and effective than the original DBF.
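As a rough illustration of the MMI criterion mentioned in the abstract, the objective is commonly written in the following textbook form. The notation is an assumption made here, not taken from the paper: $X_u$ is the feature sequence of training utterance $u$, $l_u$ its language label, $L$ the number of target languages, $\theta$ the bottleneck-layer parameters being tuned, and $\eta$ a step size. The paper's exact formulation may differ.

```latex
% Generic MMI objective over U training utterances (notation assumed, not from the paper)
\mathcal{F}_{\mathrm{MMI}}(\theta)
  = \sum_{u=1}^{U} \log
    \frac{p(X_u \mid l_u; \theta)\, P(l_u)}
         {\sum_{l=1}^{L} p(X_u \mid l; \theta)\, P(l)}

% One iterative update: gradient descent on the negated objective
\theta \leftarrow \theta + \eta \, \nabla_{\theta} \mathcal{F}_{\mathrm{MMI}}(\theta)
```

Maximizing this quantity raises the posterior of the correct language relative to all competing languages, which is the sense in which the tuned features would become more discriminative for LID.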
Related resources
Deep Bottleneck Features for Spoken Language Identification
A key problem in spoken language identification (LID) is to design effective representations which are specific to language information. For example, in recent years, representations based on both phonotactic and acoustic features have proven their effectiveness for LID. Although advances in machine learning have led to significant improvements, LID performance is still lacking, especially for ...
An Investigation of Spoken Output and Intervention Types among Iranian EFL Learners
This study was inspired by VanPatten and Uludag’s (2011) study on the transferability of training via processing instruction to output tasks and Mori’s (2002) work on the development of talk-in-interaction during a group task. An interview was devised as the pretest, posttest, and delayed posttest to compare four intervention types for teaching the simple past passive: traditional intervention ...
End-to-end DNN-CNN Classification for Language Identification
A defining problem in spoken language identification (LID) is how to design effective representations which allow features to be extracted that are specific to language information. Recent advances in deep neural networks for feature extraction have led to significant improvements in results, with deep end-to-end methods proving effective. In this paper, a novel network is proposed and explored...
Deep learning for spoken language identification
Empirical results have shown that many spoken language identification systems based on hand-coded features perform poorly on small speech samples where a human would be successful. A hypothesis for this low performance is that the set of extracted features is insufficient. A deep architecture that learns features automatically is implemented and evaluated on several datasets.
LID-senone Extraction via Deep Neural Networks for End-to-End Language Identification
A key problem in spoken language identification (LID) is how to effectively model features from a given speech utterance. Recent techniques such as end-to-end schemes and deep neural networks (DNNs) utilising transfer learning such as bottleneck (BN) features, have demonstrated good overall performance, but have not addressed the extraction of LID-specific features. We thus propose a novel end-...